target audience
- Asia > Bhutan (0.05)
- North America > United States > California (0.04)
- Africa > Sudan (0.04)
- Africa > Middle East > Egypt (0.04)
- Banking & Finance > Economy (1.00)
- Education > Educational Setting (0.70)
Towards Automatic Evaluation for Image Transcreation
Khanuja, Simran, Iyer, Vivek, He, Claire, Neubig, Graham
Beyond conventional paradigms of translating speech and text, recently, there has been interest in automated transcreation of images to facilitate localization of visual content across different cultures. Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. Drawing on theories from translation studies and real-world transcreation practices, we identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence, and visual similarity, and design our metrics to evaluate systems along these axes. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity. Meta-evaluation across 7 countries shows our metrics agree strongly with human ratings, with average segment-level correlations ranging from 0.55 to 0.87. Finally, through a discussion of the merits and demerits of each metric, we offer a robust framework for automated image transcreation evaluation, grounded in both theoretical foundations and practical application. Our code can be found here: https://github.com/simran-khanuja/automatic-eval-transcreation
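The abstract names three metric families but not their implementations. As a hedged illustration of the embedding-based family only (the function names and the rescaling choice below are assumptions, not the authors' actual metric), visual similarity between a source image and its transcreated version can be scored as cosine similarity between their vision-encoder embeddings:

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def visual_similarity(src_emb, tgt_emb):
    """Rescale cosine similarity from [-1, 1] to a [0, 1] score."""
    return (cosine_similarity(src_emb, tgt_emb) + 1.0) / 2.0

# Stand-in 512-dim embeddings; in practice these would come from a
# vision encoder (e.g. a CLIP-style image tower) applied to both images.
rng = random.Random(0)
src = [rng.gauss(0, 1) for _ in range(512)]
tgt = [s + 0.1 * rng.gauss(0, 1) for s in src]   # a visually close transcreation
far = [rng.gauss(0, 1) for _ in range(512)]      # an unrelated image

print(visual_similarity(src, tgt) > visual_similarity(src, far))
```

System-level scores would then be averaged over an evaluation set, and segment-level correlation with human ratings computed per country as in the meta-evaluation.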
TapeAgents: a Holistic Framework for Agent Development and Optimization
Bahdanau, Dzmitry, Gontier, Nicolas, Huang, Gabriel, Kamalloo, Ehsan, Pardinas, Rafael, Piché, Alex, Scholak, Torsten, Shliazhko, Oleh, Tremblay, Jordan Prince, Ghanem, Karam, Parikh, Soham, Tiwari, Mitul, Vohra, Quaizar
We present TapeAgents, an agent framework built around a granular, structured log of the agent session, called the tape, which also serves as the session's resumable state. In TapeAgents we leverage tapes to facilitate all stages of the LLM Agent development lifecycle. The agent reasons by processing the tape and the LLM output to produce new thought and action steps and append them to the tape. The environment then reacts to the agent's actions by likewise appending observation steps to the tape. By virtue of this tape-centred design, TapeAgents can provide AI practitioners with holistic end-to-end support. At the development stage, tapes facilitate session persistence, agent auditing, and step-by-step debugging. Post-deployment, one can reuse tapes for evaluation, fine-tuning, and prompt-tuning; crucially, one can adapt tapes from other agents or use revised historical tapes. In this report, we explain the TapeAgents design in detail. We demonstrate possible applications of TapeAgents with several concrete examples: building monolithic agents and multi-agent teams, optimizing agent prompts, and fine-tuning the agent's LLM. We present tooling prototypes and report a case study where we use TapeAgents to fine-tune a Llama-3.1-8B form-filling assistant to perform as well as GPT-4o while being orders of magnitude cheaper. Lastly, our comparative analysis shows that the advantages of TapeAgents over prior frameworks stem from our novel design of the LLM agent as a resumable, modular state machine with a structured configuration, that generates granular, structured logs and that can transform these logs into training text -- a unique combination of features absent in previous work.
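The tape-centred loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual TapeAgents API: the `Step`, `Tape`, `agent_turn`, and `environment_turn` names are hypothetical, and a real tape would carry structured metadata rather than bare strings.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str      # "thought", "action", or "observation"
    content: str

@dataclass
class Tape:
    """Structured session log that doubles as the session's resumable state."""
    steps: list = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)

def agent_turn(tape: Tape, llm_output: str) -> None:
    # The agent processes the whole tape plus the LLM output,
    # then appends new thought and action steps.
    tape.append(Step("thought", f"plan based on: {llm_output}"))
    tape.append(Step("action", "fill_form_field"))

def environment_turn(tape: Tape) -> None:
    # The environment reacts to the latest action by appending an observation.
    last_action = next(s for s in reversed(tape.steps) if s.kind == "action")
    tape.append(Step("observation", f"result of {last_action.content}"))

tape = Tape()
agent_turn(tape, "user asked to fill a form")
environment_turn(tape)
print([s.kind for s in tape.steps])
```

Because the tape is the whole state, serializing `tape.steps` is enough to persist, audit, replay, or resume the session, which is what enables the post-deployment reuse (evaluation, fine-tuning, prompt-tuning) the abstract describes.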
- Research Report (0.50)
- Workflow (0.46)
- Education (0.46)
- Banking & Finance (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Measuring and Improving Persuasiveness of Large Language Models
Singh, Somesh, Singla, Yaman K, SI, Harini, Krishnamurthy, Balaji
LLMs are increasingly being used in workflows involving generating content to be consumed by humans (e.g., marketing) and also in directly interacting with humans (e.g., through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce PersuasionBench and PersuasionArena, the first large-scale benchmark and arena containing a battery of tasks to measure the persuasion ability of generative models automatically. We investigate to what extent LLMs know and leverage linguistic patterns that can help them generate more persuasive language. Our findings indicate that the persuasiveness of LLMs correlates positively with model size, but smaller models can also be trained to be more persuasive than much larger ones. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that such simple metrics alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at https://bit.ly/measure-persuasion, to advance our understanding of AI-driven persuasion and its societal implications.
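The abstract does not specify how PersuasionArena ranks models, so the following is purely an assumption: arena-style evaluations commonly score models with Elo-style ratings updated after pairwise comparisons (here, which of two models produced the more persuasive message). A minimal sketch of that standard update rule:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Update both ratings after one pairwise persuasion comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Two models start even; model A wins one persuasion matchup.
a, b = 1000.0, 1000.0
a, b = elo_update(a, b, a_won=True)
print(a, b)
```

Whatever the arena's actual protocol, the point of such a rating is that it orders models by head-to-head persuasiveness rather than by a proxy like parameter count or FLOPs, which is exactly the distinction the policy discussion above turns on.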
- North America > United States > California (0.34)
- North America > United States > Utah (0.14)
- North America > United States > Kentucky (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning
Imperial, Joseph Marvin, Madabushi, Harish Tayyar
Specialized lexicons are collections of words with associated constraints such as special definitions, specific roles, and intended target audiences. These constraints are necessary for content generation and documentation tasks (e.g., writing technical manuals or children's books), where the goal is to reduce the ambiguity of text content and increase its overall readability for a specific audience. Understanding how large language models can capture these constraints can help researchers build better, more impactful tools for wider use beyond the NLP community. Towards this end, we introduce SpeciaLex, a benchmark for evaluating a language model's ability to follow specialized lexicon-based constraints across 18 diverse subtasks with 1,285 test instances covering core tasks of Checking, Identification, Rewriting, and Open Generation. We present an empirical evaluation of 15 open- and closed-source LLMs and discuss how factors such as model scale, openness, setup, and recency affect performance on the benchmark.
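To make the Checking task concrete, here is a toy sketch (my own simplification, not from the benchmark: the function name is hypothetical and the tokenization is naive word splitting, where the real subtasks are richer) of verifying that generated text stays inside a controlled lexicon:

```python
def check_lexicon(text: str, lexicon: set) -> list:
    """Return the words in `text` that fall outside the allowed lexicon.

    Toy version of a lexicon-constraint check: lowercase the text,
    strip surrounding punctuation, and flag out-of-lexicon words.
    """
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in lexicon]

# Hypothetical controlled lexicon, e.g. for an early-reader children's book.
lexicon = {"the", "cat", "sat", "on", "a", "mat", "big", "red"}

violations = check_lexicon("The big cat sat on a crimson mat.", lexicon)
print(violations)
```

A rule-based checker like this covers only vocabulary membership; the benchmark's point is to test whether an LLM can additionally honor definitions, roles, and audience constraints that no simple set lookup captures, and to do so in-context.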
- Asia > Singapore (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Philippines (0.04)
- (7 more...)
- Leisure & Entertainment (0.68)
- Aerospace & Defense (0.46)
Bridging Background Knowledge Gaps in Translation with Automatic Explicitation
Han, HyoJung, Boyd-Graber, Jordan Lee, Carpuat, Marine
Translations help people understand content written in another language. However, even correct literal translations do not fulfill that goal when people lack the necessary background to understand them. Professional translators incorporate explicitations to explain the missing context by considering cultural differences between source and target audiences. Despite its potential to help users, NLP research on explicitation is limited because of the dearth of adequate evaluation methods. This work introduces techniques for automatically generating explicitations, motivated by WikiExpl: a dataset that we collect from Wikipedia and annotate with human translators. The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > France (0.05)
- Asia > Middle East > Jordan (0.04)
- (26 more...)
- Media (0.67)
- Government (0.46)
Council Post: Artificial Intelligence: How To Turn Conversational AI Into A Successful Business
Boris Kontsevoi is a technology executive, President and CEO of Intetics Inc., a global software engineering and data processing company. AI used to be the stuff of sci-fi movies, but now it's all around us: computer vision and chatbots have become part of standard business processes. In recent years, artificial intelligence has made breakthroughs that have affected almost every industry, from high tech, telecoms, finance and healthcare to pharmaceuticals. Various studies expect the global AI market to grow by more than $500 billion between now and 2030, and IDC, a market research firm, has predicted the market will be worth over $500 billion as early as 2024.
- Health & Medicine (0.57)
- Media (0.50)
- Information Technology (0.36)
20 Best Content Marketing Tools
Today's consumers are thirsty for great content. But crafting compelling content and pushing it out to the right platforms is often easier said than done. Content marketing is many jobs rolled into one; let's admit it, we could all use a little help! This is where content marketing tools come in. From grammar-checkers to AI-driven software and content-optimizing tools, there are tons of handy options to help your content sing.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Information Management > Search (0.95)
- Information Technology > Artificial Intelligence (0.70)
How AI and ML Are Reshaping Customer Experiences - Tech News
No longer the stuff of science fiction, artificial intelligence (AI) and machine learning (ML) are revolutionizing the way customers interact with brands. Businesses that have embraced these technologies can reshape the customer experience, curate one-of-a-kind buyer journeys, and strengthen bonds with their target audiences. As your organization works to remain competitive in the modern business ecosystem, it must tap into the power of AI and ML technologies to provide a superior customer experience. Artificial intelligence and machine learning solutions can profoundly impact every facet of the customer experience. Customers who interact with your brand are looking for a personalized experience.